Intosalmi et al. BMC Bioinformatics 201 1, 12:252 
http://www.biomedcentral.eom/1 471 -2 1 05/1 2/252 



Bioinformatics 



RESEARCH ARTICLE Open Access 



Computational study of noise in a large signal 
transduction network 

Jukka Intosalmi 1,2 *, Tiina Manninen 2 , Keijo Ruohonen 1 and Marja-Leena Linne 2 * 



Abstract 

Background: Biochemical systems are inherently noisy due to the discrete reaction events that occur in a random 
manner. Although noise is often perceived as a disturbing factor, the system might actually benefit from it. In 
order to understand the role of noise better, its quality must be studied in a quantitative manner. Computational 
analysis and modeling play an essential role in this demanding endeavor. 

Results: We implemented a large nonlinear signal transduction network combining protein kinase C, mitogen- 
activated protein kinase, phospholipase A2, and f5 isoform of phospholipase C networks. We simulated the network 
in 300 different cellular volumes using the exact Gillespie stochastic simulation algorithm and analyzed the results 
in both the time and frequency domain. In order to perform simulations in a reasonable time, we used modern 
parallel computing techniques. The analysis revealed that time and frequency domain characteristics depend on 
the system volume. The simulation results also indicated that there are several kinds of noise processes in the 
network, all of them representing different kinds of low-frequency fluctuations. In the simulations, the power of 
noise decreased on all frequencies when the system volume was increased. 

Conclusions: We concluded that basic frequency domain techniques can be applied to the analysis of simulation 
results produced by the Gillespie stochastic simulation algorithm. This approach is suited not only to the study of 
fluctuations but also to the study of pure noise processes. Noise seems to have an important role in biochemical 
systems and its properties can be numerically studied by simulating the reacting system in different cellular 
volumes. Parallel computing techniques make it possible to run massive simulations in hundreds of volumes and, 
as a result, accurate statistics can be obtained from computational studies. 



Background 

Noise plays an essential role in nearly all biochemical 
systems. It derives from reaction events that are discrete 
and occur at random times. The structure of a particular 
biochemical reaction network (BRN) determines the way 
the system evolves and defines the quality of noise 
accordingly. Consequently, there exist several types of 
noise processes which occur in these systems. Noise 
induced effects can have both a quantitative and qualita- 
tive impact on the behavior of a biochemical system [1]. 
Knowing the characteristics of noise processes would 
help develop better models and understand the underly- 
ing principles of the biological phenomena better. 
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The effects of noise in biochemical systems have been 
studied to some extent but the understanding of noise is 
still rather deficient. A common misconception is that 
noise is always a disturbing factor in a biological system. 
Contrary to this popular belief, noise might in some 
cases be the factor which keeps the system functioning 
properly [2]. Noise can, for example, make the system 
more robust to external perturbations or it might lead to 
a specific behavior like noise-induced bistability with 
oscillations [3]. The importance of noise has been 
emphasized especially when the focus has been on the 
signaling networks related to memory and learning or 
gene regulatory networks (see e.g. [4-6]). The frequency 
content of noise and its relation to the structure of gene 
regulatory networks have also been studied recently 
[7-9]. Computational methods and models are outstand- 
ingly useful when stochastic effects and noise in BRNs 
are studied. There exist several stochastic modeling 
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formalisms that enable the time-evolution of BRNs to be 
studied in theory and using computer simulations [10]. 
In the field of computational systems biology, the exact 
Gillespie stochastic simulation algorithm (SSA) [11,12] 
and its several variants are the most commonly used 
stochastic simulation procedures. The drawback of the 
Gillespie SSA is the computational burden that increases 
when the number of interacting chemical particles in the 
system increases. Several approximative simulation 
approaches have been developed to decrease computing 
times (see e.g. [13-15]). The conditions in which the 
approximations are valid are, however, often hard to spe- 
cify and this makes the selection of the simulation 
method a demanding task. The computing times of exact 
simulation procedures, such as the Gillespie SSA, can 
also be decreased by applying parallel computing (see e.g. 
[16]). This approach is especially attractive if, for exam- 
ple, the statistical characteristics of a biochemical process 
need to be estimated via simulation. In addition to simu- 
lation procedures, there exist also non-simulative 
approaches which can be used, for example, to numeri- 
cally estimate noise levels (see e.g. [17]). 

Besides the stochastic modeling and simulation of BRNs, 
computational methods are invaluable in the analysis of 
biochemical data. The data, obtained from time-series 
simulations or from laboratory experiments, can be 
numerically studied both in the time and frequency 
domain. Out of these two, the time domain analysis is the 
traditional approach. Typical time domain statistics are 
the mean, variance, autocorrelation, etc. which can be 
used to characterize the behavior of time-series. Frequency 
domain analysis, often used by engineers and physicists, 
provides other kind of information about the system. 
Using the frequency domain approach it is possible 
to decompose a biochemical signal into its frequency 
components and to study the magnitude of fluctuations at 
different frequencies. Fluctuations, both random and 
deterministic, are important in the functioning of biologi- 
cal systems. Even simple BRNs can be selectively respon- 
sive to specific frequency ranges [18,19]. The importance 
of periodic changes in chemical concentrations being 
widely known, it is surprising to notice that most simula- 
tion studies do not provide even a rough survey of fre- 
quency domain behavior. Some studies present analytical 
results for the signal processing properties of BRNs (see 
e.g. [18,19]) and the frequency domain characterization of 
biochemical noise (see e.g. [8,9,20]). These approaches, 
however, are often suitable only for linear or small net- 
works, require an unbearable amount of calculations, or 
have other restrictions. 

In this study, we utilize a straightforward numerical 
approach to explore noise in a biologically realistic BRN 
using simulated data. We implement a large nonlinear 
signal transduction network combining protein kinase C 



(PKC), mitogen-activated protein kinase (MAPK), phos- 
pholipase A2 (PLA2), and /? isoform of phospholipase C 
(PLC/3) networks. This BRN consists of 66 chemical spe- 
cies and 110 one-way reactions. In general, stochastic 
simulation of large networks of this kind is challenging 
and thus several technical aspects have had to be taken 
into account when implementing the network. The net- 
work is originally published by Bhalla and Iyengar in 
1999 [21] and its parts have been studied to some extent 
using both deterministic and stochastic modeling meth- 
ods [15,21-26]. In this study, we perform massive Monte 
Carlo simulations for the large network by applying 
parallel computing. As a stochastic simulation method 
we use the exact Gillespie SSA. We run simulations alto- 
gether in 300 different cellular volumes and compute the 
time and frequency domain characteristics of the noise 
processes for each volume. This kind of approach pro- 
vides us with an overall picture of the noise in the system 
as a function of system volume. We show how basic 
frequency domain methods can be applied and what 
advantages they have compared to the time domain 
methods. 

Methods 

Stochastic modeling of BRNs 

BRNs can be modeled using numerous different formal- 
isms. To the modeler, a biochemical system can be per- 
ceived as a container full of particles that have certain 
sizes and velocities. When these particles (chemical 
species) collide, they react with some probability and pro- 
duce other species [27]. The well-established theory of 
molecular dynamics describes how these chemical reac- 
tions occur at the molecular level and, in principle, we 
are capable of describing the dynamics of reacting species 
in detail [27]. In real systems, however, the amount of 
particles is large and it is impossible to track each and 
every molecule. Based on the theory of stochastic chemi- 
cal kinetics, these systems can often be assumed to be 
well-stirred. This means that the particles are uniformly 
distributed over the volume and, in order to understand 
the time-evolution of the system, we need to keep track 
only of the numbers of particles of each species [27]. 
Gillespie has done pioneering work in describing the 
time-evolution of a well-stirred chemical system in terms 
of continuous-time discrete-state Markov processes 
[11,12]. He has also developed the formalism which 
enables us to simulate the Markov model as a straightfor- 
ward computer algorithm, nowadays known as the Gille- 
spie SSA. By means of the simulation algorithm we are 
able to generate realizations of the underlying stochastic 
process. A sufficient number of independent realizations 
can then be used to compute accurate statistical charac- 
teristics describing the process [28]. In most of the cases, 
it is impossible to obtain these characteristics analytically 
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and, thus, simulation algorithms like the Gillespie SSA 
provide us with invaluable tools. In the following, we 
briefly recapitulate the theory behind the continuous- 
time discrete-state Markov model and Gillespie SSA. 

Stochastic processes having the so-called Markov 
property (i.e. Markov processes) are by far the most 
important in physics and chemistry [29]. The Markov 
property states that the future behavior of the process 
depends only on the current state of the system. In the 
context of biochemical systems, this assumption can 
often (but not always) be accepted and Markov pro- 
cesses provide us with a well-established modeling form- 
alism. In order to construct a Markov model for a 
biochemical system, we need to introduce some termi- 
nology. Let us consider a biochemical system consisting 
of n chemical species X h i = 1, n, and m reactions Ry, 
/ = 1, m, and let X(t) be a stochastic process describ- 
ing the time-evolution of the system. Each reaction R, in 
the system can be characterized by a propensity function 
«,(X) so that cij(X)At gives the probability that the reac- 
tion R ; will occur during the finite time interval At [11]. 
The propensity functions depend only on the current 
state of the system and thus the Markov property is 
satisfied. With each reaction event we associate the so- 
called stoichiometric vector v», so that when the /th 
reaction occurs, the state of the system is updated by X 
+ v,-. In addition, we assume that the initial state of the 
system X(t), t = 0 is known. 

Using the notation above, the system can be fully 
characterized by a continuous-time discrete-state Mar- 
kov process. By denoting the probability that the system 
is in the state x at time t given the system is in the state 
x 0 at time t 0 by p{x, t|x 0 , t 0 ) and assuming that only one 
reaction can occur during dt, we can write 

p(x,t + dt\x 0 , to) = 

( ~ \ 

p(x, t|x 0/ 1 0 ) 1 - y~] aj{x)dt + 



(1) 



p(x - Vj, t|x 0 , £ 0 )flj(x - Vj)dt. 

Consequently, the time evolution of the probabilities 
can be described by a set of coupled differential equa- 
tions which can be written in the form 



9p(x, t|x 0 , t 0 ) 
dt 



= y^[«j( x - V ;)P( X - Vj, t|x 0 , to) 

(2) 

-o,(x)p(x, t|x 0/ 1 0 )], 



where p(x, t|x 0 , t 0 ), <Zj, and v y - are as described above 
[27]. This equation is called the chemical master equa- 
tion (CME). 



Based on the formalism described above, we can con- 
struct the CME for any BRN of interest. The problem is 
that the CME is analytically and even numerically 
intractable. Although the solution of CME can in some 
cases be solved or approximated (using e.g. the finite 
state projection algorithm [30]), the numerical simula- 
tion of the underlying Markov process is often practical. 
The Gillespie SSA [11,12] is the most popular procedure 
for this purpose. It can be derived from the CME with- 
out additional assumptions or approximations and is 
exact in that sense. A detailed derivation of the Gillespie 
SSA can be found in the references [11,12,27]. 

The Gillespie SSA has proven to be useful in several bio- 
chemical simulation studies, ranging from studies of gene 
expression to stochastic ion channel dynamics (see e.g. 
[31,32]). In some cases, however, the algorithm becomes 
computationally heavy. This is the case especially if the 
size of the system is large (i.e. the number of particles in 
the system is large) and reactions occur more frequently. 
In such situations, we have to consider approximations 
which are usually based on the time-discretization of the 
continuous-time process [27]. In this study, we simulate a 
reaction network in which the numbers of molecules in 
some chemical species are relatively small and thus the 
approximations are not valid. Consequently, our simula- 
tions are carried out using the exact Gillespie SSA. To be 
able to run massive simulations in a reasonable time, we 
apply parallel computing techniques. 

Deterministic modeling of BRNs 

In the previous, we have described how biochemical sys- 
tems containing only small numbers of molecules can be 
modeled in detail. Sometimes, however, random effects 
may be neglected and simpler, deterministic models can 
be used. When large chemical systems which contain 
huge numbers of molecules are concerned, random fluc- 
tuations tend to average out and the time-evolution of 
the system can be modeled using a continuous-time con- 
tinuous-state deterministic ordinary differential equation 
(ODE) model. The traditional ODE model is based on 
the law of mass action, and like Gillespie [27] has shown, 
the model is asymptotically equivalent to the stochastic 
model when the volume of system is increased. Accord- 
ing to the law of mass action the dynamics of a chemi- 
cally reacting system can be described by the equation 



dX{t) 
dt 



Su, 



(3) 



where S is the stoichiometric matrix containing the 
stoichiometric vectors as columns and the state-depen- 
dent vector u describes the reaction rates. In this study, 
the deterministic ODE model is used to determine the 
deterministic steady-state of the system. 
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Analyzing simulation results in time domain 

Simulation results are traditionally analyzed in the time- 
domain by computing the time-dependent sample mean 
and sample standard deviation of a stochastic process 
(see e.g. [15]). These characteristics are indeed useful as 
we are dealing with stochastic processes related to bio- 
chemical applications. Another useful statistical number 
is the coefficient of variation which we use in this work. 
The coefficient of variation can be computed using the 
formula 



where o{i) is the (sample) standard deviation of the 
process and fi(t) is the sample mean. If the stochastic 
process is stationary (i.e. its statistical properties do not 
change in time), we can leave the time variable t out. 
The coefficient of variation provides us with the infor- 
mation of how strong the noise is compared to the 
mean level of the signal. In addition to the characteris- 
tics mentioned above it is sometimes useful to study the 
distribution of the process. Similar to the estimates for 
the mean, variance, and coefficient of variation, the dis- 
tribution of the process can also be approximated using 
a large number of independent realizations. The 
approximated distribution can then be illustrated for 
example using histograms. 



requirement is difficult to fulfill when realizations gener- 
ated by the Gillespie SSA are analyzed. The realizations 
often contain rapid changes and thus the sampling fre- 
quency would have to be unreasonably high. Based on 
our numerical tests, however, the fastest changes usually 
do not seem to have a lot of power and, as an approxi- 
mation, we can content ourselves with a lower sampling 
frequency. Pseudocode for our sampling algorithm is 
given in Figure 1 and the operation of the algorithm is 
exemplified in Figure 2. The algorithm enables us to 
sample the signal so that the most rapid changes are 
neglected. 

After sampling, the signal is often down-sampled by 
some factor in order to adjust the frequency range of the 
frequency domain representation to the desired level. By 
altering the sampling frequency and the length of the 
time window (which is used to define a finite sequence 
from the infinite signal), it is possible to extract different 
kind of information from the signal. As biochemical sys- 
tems often operate on various time scales, it is natural to 
pay attention to the selection of frequency range. If low 
frequencies are of interest, down-sampling of the signal 
is required. Before down-sampling, the signal must be 
low-pass filtered. Otherwise the high frequency fluctua- 
tions will aliase on the other frequencies in the frequency 
domain representation (for details see e.g. [33]). The fil- 
tering can be carried out using any available low-pass fil- 
ter. The MATLAB* function 'decimate' practically 



Analyzing simulation results in frequency domain 

Although the time domain analysis often provides 
important information about the biochemical system of 
interest, it still gives quite a limited insight into the 
(often non-linear) system. A broader view can be 
obtained by combining the time domain analysis with 
the frequency domain analysis. This approach provides 
us with information about the fluctuations typical for 
the particular system and the quality of noise arising 
from molecular interactions in general. In the following, 
we discuss frequency domain analysis, define terminol- 
ogy, and present a straightforward way of obtaining a 
rough approximation of the frequency domain behavior 
of a biochemical system through numerical frequency 
domain analysis. 

The starting point for numerical frequency domain 
analysis is sampling. This means that a continuous-time 
signal c(f), °° < t < °°, is sampled at discrete time points 
nAt, n = 0, + 1, ± 2, and, as a result, we have a dis- 
crete signal c[n]. The choice of the time step At (s) 
determines how frequently the signal is sampled. The 
reciprocal of the time step is called the sampling fre- 
quency (denoted by F) and, in order to capture all 
details of the original signal, it should be twice as much 
as the fastest oscillation in the signal [33]. However, this 



1 


Initialize a vector for samples 


2 


y = [s/i,2/2, • --,2/jv+i]; 


3 


Initialize indices 


4 


n = 0; i = 1; 


5 


Sample the signal x(t) and store values in y 


6 


while n < N do 


7 


Find the most recent change in concentration 


8 


before the current sample 


9 


while U < T + nAt do 


10 


temp = Xi ; 


11 


i = i + 1; 


12 


end while 


13 


n = n + 1; 


14 


Un = temp; 


15 


end while 


Figure 1 Pseudocode for sampling algorithm A biochemical 


signal x(f) obtained from Gillespie SSA simulation can be 


characterized using two vectors x = [x-,, x 2 , x M ] and t = [f,, f 2 , 


t M 1 


Time points f,, / = 1, 2, M, define when the concentration has 


changed and x,, i = 1,2, M, describe the concentration within the 


interval [f,-, f,+1), / = 1,2, M -1. These vectors, as well as the 


starting time for sampling (T), the number of samples (W), and the 


time step (At), are given as parameters for the algorithm. 
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Figure 2 Operation of sampling algorithm The continuous-time 
signal representing the concentration of chemical species P is 
sampled using the time step of 0.05 s (red dots) and of 0.25 s (blue 
circles). The small time step detects all details of the signal whereas 
the large time step neglects rapid changes. 



density (PSD) estimates can then be computed using the 
equation 



combines filtering and down-sampling and we highly 
recommend it. 

The most essential part of the frequency domain analy- 
sis is the estimation of the actual frequency content of 
the signal. The frequency content can be estimated using 
a wide variety of methods. The methods include, for 
example, the standard periodogram method, Blackman- 
Tukey method, autoregressive model, maximum likeli- 
hood method, etc. (for a review, see [34]). In this study, 
we use the standard periodogram method as it is straight- 
forward to implement and use. The standard periodo- 
gram approach can also be easily modified to fit for 
different kind of systems and it gives a good overall pic- 
ture of the frequency domain behavior. If a more detailed 
time-frequency representation of a chemical system is of 
interest, one should use more advanced, non-stationary 
data processing methods (for a review, see e.g. [35]). 
Although these methods are more complicated to imple- 
ment and use, they are in some cases required. The stan- 
dard periodogram method is well-applicable if we are 
dealing with a process which is approximately stationary 
(at least within a suitably short time window). 

The standard periodogram method is based on the 
discrete Fourier transform (DFT). The DFT for a finite 
discrete signal c[j], j = 0, N - 1, is mathematically 
defined by 



c k = j2 c \j] e ~ l2Kki/N ' k 



0,...,N- 1. 



(5) 



In practice, the DFT is computed using the fast Four- 
ier transform (FFT) algorithm. A weighting window (e.g. 
Blackman window, Hamming window) is often applied 
to the signal to be transformed before computing the 
DFT to prevent the bias caused by the finite length of 
the signal [36]. The actual (one-sided) power spectral 



P(fk) 



1 [ic fe i 2 + ic N _ fe i 2 ],fe = i rjj-i) 

2 (6) 



N 2 



2 



where we assume that N is always chosen to be even 
and each f k = Fk/N , k = 1, N/2, presents a positive 
frequency [37]. 

Parallel computing 

The basic idea of parallel computing is to divide a com- 
putationally intensive routine into independent subtasks 
and execute them in parallel on multiple processors [38]. 
When computationally heavy Monte Carlo procedures, 
such as the Gillespie SSA, are used, carefully implemen- 
ted parallelism can be used to reduce computing times 
significantly. In this study, we simulate the large network 
in 300 different cellular volumes. The serial execution of 
these simulations would be in practice impossible but the 
parallel simulation of all volumes can be carried out in a 
few days. We implement parallelism using the parallel 
computing platform (PC Grid) provided by Techila Tech- 
nologies Ltd. 

Results 

Simulation setup 

In this study, we simulated a large nonlinear signal 
transduction network that combines the PKC, MAPK, 
PLA2, and PLC/i networks and analyzed the simulation 
results in both the time and frequency domain. The 
reactions of the network are presented in Figure 3. 
From now on, we refer to this network as the large net- 
work. We implemented all simulation procedures as 
well as analyzed the results in MATLAB R . In order to 
run the simulations in a reasonable time, we used the 
parallel computing platform (PC Grid) provided by 
Techila Technologies Ltd. 

In the simulations, we first used the ODE model to 
determine the deterministic steady- state of the system and 
then simulated the actual noise processes around the 
steady-state in 300 different cellular volumes. As a sto- 
chastic simulation algorithm we used the Gillespie SSA. 
The simulated volumes were equidistantly spaced between 
5 x 10' 1 (comparable, for example, to the volume of the 
dentritic spine) and 10' 13 1 (comparable, for example, to 
the volume of a cell). We sampled the simulated noise 
processes using the sampling algorithm presented in 
Figure 1. The sampling frequencies were 10 - 10 Hz. 
They were chosen depending on the properties of the 
sampled signal so that at least 95 percent of the power of 
the original signal was captured. The resulting signals 
were then filtered and down-sampled to obtain the desired 
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Figure 3 Biochemical reaction network combining PKC, MAPK, PLA2, and PLC/} networks Reaction network combining PKC, MAPK, PLA2, 
and PLC/3 networks. The forward, backward, and catalyzing reaction constants are given by kf , /Cf» and k catl respectively. The species APC, 
tempPIP2, Inositol, PC, and PIP2 are treated as constant model inputs and have the concentrations 30 x 10~ 6 M, 2.5 x 10~ 6 M, 0 M, 0 M, and 2.5 
x 10~ 6 M, respectively. The other 61 species are treated as model variables. 



frequency range. In the estimation of PSDs, we simply 
used a rectangular window. The PSDs were smoothed by 
summing adjacent frequency bins. 

Time domain characteristics of noise 

Biochemical information processing occurs in various 
cell organelles, all of them having different volumes. In 
order to learn how noise changes as a function of 



volume, we simulated the large network in different 
volumes. The network includes 61 non-input chemical 
species and in order to get an overall understanding of 
the behavior of these species, we computed the coeffi- 
cients of variation and the frequencies of change for 
each species in each volume. The frequency of change 
(FC) is defined to be the number of changes in the 
molecular concentration within a certain time period. 
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These characteristics were estimated using 100 second 
realizations and they are computed over time. The 
results are shown in Figure 4. The gray areas in the heat 
maps indicate that certain species maintain either a zero 
concentration (craf"deph, GqCaPLCbcomplex, GqPLC, 
GqCaPLC, G*GTP, cRafl**) or a constant concentration 
(G*GDP) in all studied volumes. The non-active role of 
GaGDP (G*GDP, Ga is a subunit of G protein, GDP is 
guanosine diphosphate) can be explained by the fact 
that the concentration of the activated form of enzyme 
PLC/3 (GqCaPLC, PLC/3 is the /? isoform of phospholi- 
pase C) remains at zero concentration in all simulated 



volumes. This and other non-active species seem a bit 
suspicious from the biological point of view but, on the 
other hand, we have to keep in mind that this network 
is a subnetwork that has been extracted from a larger 
network and thus some parts of the network do not 
necessarily produce the natural level of activity. The 
rows (species) in Figure 4 are sorted in an ascending 
order according to the initial concentrations of species. 
It is important to notice that in the smallest volume 39 
species have a zero concentration as an initial value. In 
the largest volume, the number of initial concentrations 
equal to zero is 18. The most of the species however 
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Figure 4 Normalized coefficients of variation and normalized frequencies of change. (A) Normalized coefficient of variation (CV) and (B) 
normalized frequency of change (FC) for all model species in different volumes are illustrated by color coding (heat map). The CV describes the 
strength of noise and FC tells how many times the concentration changes on average in a given time period. The values for CV and FC were 
estimated using 100 second signals (simulated noise processes). The simulated volumes were equidistantly spaced between 5 x 10~ 16 I 
(comparable, for example, to the volume of the dentritic spine) and 10~ 13 I (comparable, for example, to the volume of a cell). The rows in the 
heat maps are sorted in an ascending order according to the initial concentrations of species. The CV and FC values are normalized by their 
maximum values. The intensities in the heat maps vary from values close to zero (dark blue) to 1 (dark red). The zero CV and FC values are 
mapped to gray points. The large network includes species that stay at zero concentration in all studied volumes (craf**deph, 
GqCaPLCbcomplex, GqPLC, GqCaPLC, G*GTP, cRafl**) as well as a species remaining at a constant concentration (G*GDP). As supposed, the 
strength of noise seems to decrease and the frequency of change increase when the volume is increased. 
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seem to be active in all volumes even though the initial 
concentration was zero. When the molecular concentra- 
tion is low, the discrete nature of chemical reactions is 
emphasized and consequently also the noise level is 
fairly high when compared to the average signal level. 
This can also be concluded from the coefficients of var- 
iation in small volumes (Figure 4A). 

To pick up an interesting species producing irregular 
behavior, we consider the active form of mitogen-acti- 
vated protein kinase (MAPK*) and discuss its behavior 
in small and intermediate volumes. In these volumes, 
the MAPK* concentration is very low or zero, as we can 
see in Figure 4. There are simulation runs during which 
the concentration stays at zero and, on the other hand, 
there are irregularly occurring realizations showing 
activity. The MAPK* thus represents some kind of non- 
stationary behavior and it would be interesting to see 
how its behavior reacts to different kinds of external sti- 
muli in different volumes. We however leave this for a 
future work and do not try to make any biological con- 
clusions based on the current results. 

In general, the strength of noise seems to decrease and 
the frequency of change increase when the volume 
increases. This of course fits well to the theory of sto- 
chastic chemical kinetics. When the numbers of mole- 
cules increase in the system, the reactions occur at a 
higher rate and stochastic effects tend to average out. 
Most of the noise processes in the model behave rela- 
tively well and the noise is attenuated when the volume 
increases. It is however important to note that the 
strength of noise changes in a different manner for differ- 
ent species. For example, the noise strength in inositol 
trisphosphate (IP3) seems to decrease much faster com- 
pared to the strength of noise in protein phosphatase 2A 
(PPhosphatase2A) concentration (Figure 4A). Similar dif- 
ferences can also be observed between different species 
when comparing how FC changes as a function of 
volume (Figure 4B). 

The molecules that are present in low numbers seem 
to produce the most irregular and unpredictable beha- 
vior. Heavy stochastic fluctuations can however be 
observed also in species that are present in higher con- 
centrations. For example, arachidonic acid (AA) and dia- 
cylglycerol (DAG) have high concentrations compared 
to the other model species but they still produce notable 
fluctuations especially in small volumes. In the reaction 
network, AA and DAG are linked to the species that are 
present in low concentrations and it is likely that the 
heavy fluctuations in small volumes are due to these 
interactions. To track down the source of these fluctua- 
tions is another interesting question that we will leave 
for further studies. 

Although we have observed irregular behavior and 
heavy fluctuations besides nicely behaving noise 



processes in the network, it seems that none of the 
model species deviates far away from the deterministic 
steady-state. This seems to tell something about the 
robustness of the network, although we did not perform 
inclusive analysis of the model dynamics (e.g. bifurcation 
analysis). Noise seems, according to our analysis, to have 
more impact on the behavior of the system when the 
system volume is small. The coefficients of variation 
computed in different volumes show that the random 
fluctuations are notably stronger in small volumes than 
in larger volumes. Therefore, the use of stochastic mod- 
eling and simulation methods is especially important if 
we are modeling biochemical systems in small volumes. 

Estimating frequency domain behavior 

The dependence between the system volume and the 
quality of noise can also be studied in the frequency 
domain. In general, the frequency domain analysis of the 
simulation results shows that most of the power in these 
noise processes is on the low frequencies. The quality of 
the PSD estimates depends somewhat on the stationarity 
of the noise processes studied. In order to illustrate the 
volume dependence of the quality of noise in the fre- 
quency domain, we have selected two model species: 
phospholipase C (PLC), and calcium phospholipase C 
complex (CaPLCcomplex). Illustrative realizations of 
these species are shown in (Figure 5A and 5B). The reali- 
zations are simulated in four different system volumes, 
5 x 1(T 16 1 (blue), 3.5 x 10' 15 1 (red), 1CT 14 1 (green), and 
1(T 13 1 (black). The realizations show how the discrete 
nature of reactions plays an important role in the smal- 
lest volume and how the strength of noise gets smaller 
when the system volume increases. The effects of noise 
are however still detectable also in the largest volume. By 
taking a look at the PLC and CaPLCcomplex noise pro- 
cesses in (Figure 5A and 5B), it is easy to conclude that 
they are somewhat different. Therefore it is interesting to 
see how their behavior differs in the frequency domain. 
We estimated PSDs of the noise in the PLC and CaPLC- 
complex concentrations using 10 second realizations 
of these processes. The frequency content of these 
noise processes as a function of volume can be seen in 
(Figure 5C and 5D). The noise in both species seems to 
have most of its power on the lower frequencies. Similar 
behavior was also observed in other species in the net- 
work. The PSDs for PLC and CaPLCcomplex seem to 
have different shapes (see Figure 5C and 5D). The noise 
in PLC realizations clearly has the dominating power on 
very low frequencies whereas the frequency content of 
noise in CaPLCcomplex realizations is distributed more 
uniformly on low frequencies. An interesting observation 
also is that the shapes of PSDs for species PLC and 
CaPLCcomplex seem to be same in all studied volumes. 
This can be seen clearly in Figure 6, where four PSDs in 
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Figure 5 Behavior of PLC and CaPLCcomplex, realizations and estimated PSDs. (A) Realizations for PLC (B) Realizations for CaPLCcomplex. 
The realizations are simulated using the Gillespie SSA in four different system volumes, 5 x 10~ 16 I (blue), 3.5 x 10~' 5 I (red), 10~ 14 I (green), and 
10~ 13 I (black). The discrete nature of reactions can be easily seen in the smallest volume. In larger volumes, the strength of noise is smaller but 
still detectable. (C) Frequency domain behavior for PLC. (D) Frequency domain behavior for CaPLCcomplex. The estimated PSDs for PLC and 
CaPLCcomplex in different volumes (5 x 1CT 16 - 10~ 13 I) are illustrated by representing the frequency content of signals by color coding (heat 
map). The PSDs were estimated using 10 second signals (simulated noise processes). The noise in both species has more power on lower 
frequencies than higher frequencies. 



different volumes for PLC and CaPLCcomplex are 
plotted using the log-log-scale. 

When low frequency noise is observed, it usually raises 
specific questions about the quality of noise. The shape 
of the PSDs in Figure 6A is similar to the kind of beha- 
vior that is typical for power law or l//*-noise processes. 
When these processes are studied in the frequency 
domain, the shape of the spectrum is determined by the 
power law f a , where /is the frequency, l/f -noise pro- 
cesses have been found in several physical systems and 
they have been extensively studied [39]. A special case of 
these noise processes is the so-called l//noise which 
cannot be characterized in the time domain. This makes 
1 //-noise extremely hard to model and identify. In our 
simulations, the Gillespie SSA which practically simulates 
a Markov model produces a behavior for the noise in 



PLC concentration that clearly resembles some kind of 
l/f -noise. Although a Markov model is not capable of 
producing pure 1 //-noise, it is still of interest to study the 
quality of the noise processes in the context of power law 
noises. The slopes of the log-log-scaled spectra in Figure 
6A are in the range [-1.8, -1.7]. A pure l//-noise process 
would have a slope of -1. This means that the behavior of 
the behavior of noise in PLC concentration is closer to 
the behavior of a random walk process (slope of -2) than 
l//-noise. It is still unclear if there exists l//-noise in real 
biochemical systems. Computational models and meth- 
ods will play a crucial role in studying this issue in the 
future. 

We conclude that the frequency domain analysis is a 
workable approach when studying noise in biochemical 
reaction networks. We showed that even the simple 
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methodology that we used in this study can be success- 
fully applied to assess these features. We propose that 
the frequency domain analysis for noise should always 
be performed when BRNs are modeled using stochastic 
approaches. 

Discussion 

Biochemical noise and computational techniques 

In this study, we investigated noise processes occurring 
in a large biochemical network. The analysis was carried 
out in both the time and frequency domain. The numeri- 
cal frequency domain analysis of this kind has been 
applied also in other simulation studies where periodic or 
quasi-periodic oscillations obtained from Gillespie SSA 
simulations have been of interest (see e.g. [25,40,41]). In 
this study, however, we concentrated on the quality of 
noise in pure noise processes instead of oscillations. Our 
results (on time and frequency domain) were in agree- 
ment with previous studies: the high-frequency noise is 
attenuated by the system structure [2], in small volumes 
discrete reaction events become more important [19], 
and when the volume is increased, the importance of 
noise slowly diminishes but does not disappear [27]. In 
addition to the previously presented results, we showed 
how the frequency content of a biochemical noise pro- 
cess changes as a function of volume. To our best knowl- 
edge, this kind of analysis has not been done before and 
we believe that our approach could be applicable in other 
studies as well. Especially it might shed light on the ques- 
tion of the quality of noise in different kinds of modeling 
approaches and it could be applied for example when 



benchmarking new approximate simulation approaches. 
To make the computational techniques used in this study 
easily accessible, we introduced a straightforward sam- 
pling algorithm for Gillespie SSA simulation results (see 
Figure 1). It is also obvious that the methods presented 
in this study are easy to implement and use. Therefore, 
the kind of analysis we present here could be carried out 
for example as a starting point for a more advanced fre- 
quency domain analysis. 

Although our emphasis was on the frequency domain 
analysis, the time domain results of our study were of 
interest as well. We noted that the noise processes simu- 
lated using the stochastic Gillespie SSA do not deviate far 
from the ODE response. This kind of behavior shows the 
robustness of the network: although the environment 
and reaction events are noisy, the network still performs 
the same task. In addition, the coefficient of variation was 
used to describe the dependency between the strength of 
noise and volume in the time domain. Although the most 
of the simulated noise processes behave rather well, the 
large network also includes noise processes representing 
more unpredictable and irregular behavior. 

Besides the methodological aspects of this study, the par- 
allel computing proved to be an indispensable technique 
when massive BRNs simulations were performed. Without 
parallel computing, the simulation of 300 different volumes 
would have been impossible and we would have had to 
content ourselves with less inclusive results. Although the 
implementation of parallelism takes time, the benefits are 
so notable that the parallelization is unquestionably worth 
doing. We believe that the application of parallel computing 
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will increase explosively in the field of computational 
systems biology and its subfields in the near future. 

Insights and future work 

The results presented in this article give new insight to 
the quality of noise in one signal transduction network. 
In addition, the methodology can in principle be applied 
to the characterization of noise processes in any other 
similar system. The methods presented in this paper are 
widely applicable because almost all biological processes 
inherently represent some kind of variability. Without a 
proper analysis it is impossible to know if noise has any 
practical meaning. When we are dealing with data pro- 
duced by computer simulations, we are able to fully con- 
trol the whole process of data production. In order to 
extract all possible information from the results, new 
methodology should be developed and applied. Fre- 
quency domain analysis is widely applied in science in 
general and, as we have shown, it can, with minor modifi- 
cations, be applied also to the analysis of Gillespie SSA 
simulation results. Frequency domain analysis is, how- 
ever, just one of the numerous ways of analyzing stochas- 
tic simulation results from a new perspective. Basically all 
kind of methods that can be used to extract information 
from time-series data are potential tools in this particular 
area and we hope that this area of research will attain 
more attention. There still exist numerous challenges in 
the analysis of noise in BRNs. Our future work includes, 
for example, the testing and development of new analysis 
methods for examination of noise in subcellular systems. 
We are especially interested in noise processes represent- 
ing 1/f-noise which we also discussed in this study. Our 
further interests include, for example, new modeling 
approaches such as non-Markovian models including 
delays and their capability of producing biologically rea- 
listic noise processes. 

Conclusions 

In this study, we discussed how noise arising from mole- 
cular interactions in biochemical reaction networks can 
be examined using simulations and numerical frequency 
domain analysis. Biochemical reaction networks form the 
basic information processing mechanisms in biological 
systems and, in order to understand these mechanisms, 
we have to understand the stochastic phenomena affect- 
ing molecular dynamics. Stochastic modeling is an 
invaluable tool in this endeavor. We implemented a sto- 
chastic model for a large, realistic biochemical reaction 
network, performed massive parallel simulations, and 
analyzed the simulation results both in the time and fre- 
quency domain. We concentrated on the characterization 
of intrinsic noise appearing in a specific network. The 
simulation results showed that there are several kinds of 
noise processes in the network, all of them representing 



different kind of low-frequency fluctuations. The fre- 
quency domain behavior of biochemical noise processes 
was presented as a function of an altering system volume. 
The low-frequency nature of the noise processes in all 
studied volumes could be deduced from the estimated 
power spectral densities. 
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